This document contains the totality of the analysis I am carrying out on the first experiment of my first Qualifying Paper towards the PhD in Linguistics at Stanford University. This project deals with the processing of gendered and gender-neutral personal and professional titles. This is also being developed as my analysis portion of the final project in Judith Degen’s Methods in Psycholinguistics class (LING245B).
We require the following libraries; their roles in the analysis are summarized below:
library(ggplot2)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ──────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ tibble 3.1.2 ✓ dplyr 1.0.6
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
✓ purrr 0.3.4
── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(lme4)
Loading required package: Matrix
Attaching package: ‘Matrix’
The following objects are masked from ‘package:tidyr’:
expand, pack, unpack
library(stringr)
library(languageR)
library(lmerTest)
Attaching package: ‘lmerTest’
The following object is masked from ‘package:lme4’:
lmer
The following object is masked from ‘package:stats’:
step
library(reshape2)
Attaching package: ‘reshape2’
The following object is masked from ‘package:tidyr’:
smiths
source("helpers.R")
source("merge_results.R")
[1] "Not enough arguments supplied!"
ggplot2: data visualization
tidyverse: data management & manipulation
lme4: mixed-effects models
lmerTest: p-values for mixed-effects models
languageR: psycholinguistics utilities
reshape2: data reshaping
stringr: needed to compute string lengths
We also add “helpers.R” as a dependency, which includes custom code for implementing error bars in ggplot2 visualizations.
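helpers.R itself isn't reproduced in this document, but the ci.low() and ci.high() calls used in the error-bar plots below behave like bootstrap confidence-interval half-widths. A minimal sketch under that assumption (not the actual helpers.R code):

```r
# Hypothetical sketch of the ci.low()/ci.high() helpers (the actual
# helpers.R is not shown here). Assumed behavior: each returns the
# distance from the sample mean to the lower/upper bound of a
# bootstrapped 95% confidence interval, so error bars can be drawn
# from mean - ci.low(x) to mean + ci.high(x).
ci.low <- function(x, n_boot = 1000) {
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  mean(x) - quantile(boot_means, 0.025, names = FALSE)
}

ci.high <- function(x, n_boot = 1000) {
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  quantile(boot_means, 0.975, names = FALSE) - mean(x)
}
```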
This just makes it so that all ggplot2 plots will use the minimal theme:
theme_set(theme_minimal())
The merged .csv files were produced ahead of time by running merge_results.R from the command line with the appropriate arguments (hence the "Not enough arguments supplied!" message when it is sourced without them):
Rscript merge_results.R gender_processing_a_selfpaced_reading_time_study_merged.csv mixed gender_processing_followup_demographics_merged.csv republican
This reads in our .csv file with all 37,800 data points logged during the experiment.
all_data <- read.csv('merged_all.csv') %>%
select(-error)
all_data %>%
head()
However, we definitely don’t want all 37,800 data points! Many of these aren’t critical trials, and this still includes the example trial where participants were learning how to take the experiment. In this section we filter out: the example trials, all non-critical regions, the data from participants who fell below the 85% accuracy threshold on the attention checks, and outlier reading times.
Once we do this, we are left with a total of 5,342 data points!
This part is a bit ugly, so I’ve collapsed each of these steps into a separate tab, but feel free to explore if you’d like! They are (hopefully) thoroughly annotated.
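For orientation, the row- and participant-level exclusions detailed in the tabs below amount to something like the following pipeline. This is a hypothetical sketch on a toy data frame (made-up values, same column names); the tabs apply each step to the real data one at a time:

```r
library(dplyr)

# Toy stand-in for the real data frame
demo <- tibble(
  workerid = rep(1:2, each = 3),
  trial_id = rep(c('example', 't1', 't2'), 2),
  region   = rep(c('filler', 'critical', 'critical'), 2),
  response_correct = c(1, 1, 1, 0, 0, 1),
  rt = c(310, 450, 520, 400, 610, 395)
)

# Participants below the 85% accuracy threshold
low_acc <- demo %>%
  group_by(workerid) %>%
  summarise(accuracy = mean(response_correct)) %>%
  filter(accuracy < 0.85) %>%
  pull(workerid)

# Drop example trials, non-critical regions, and excluded participants
cleaned <- demo %>%
  filter(trial_id != 'example',
         region == 'critical',
         !(workerid %in% low_acc))
```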
This simply filters out any data point which has the trial_id ‘example’, which is all 8 data points of the example trial, for each of the 298 participants.
all_data <- all_data %>%
filter(trial_id != 'example')
all_data %>%
select(workerid,rt,trial_id)
Similar to the previous tab (Removing Example Trials), this code removes from our data frame any non-critical data points. In this case, we are interested in the reading times on the personal and professional titles, so these have been logged as the ‘critical’ trials in the experiment. This code gets rid of everything else.
all_data <- all_data %>%
filter(region=='critical')
all_data %>%
select(workerid,rt,region)
This is the most complicated of the removal steps, so let’s break it down.
First, we need to create a data frame with the workerids of every participant who did not meet the 85% accuracy threshold on the attention checks. The attention checks (logged as response_correct) have binary values: a 1 means that they got the question correct, while a zero (0) indicates that they did not.
So, we create a data frame with the participants who didn’t meet the threshold by grouping all the data by participant, and then creating a small data frame with just the workers and their relative accuracies, recorded in the new accuracy column. This is computed by averaging each participant’s response_correct scores:
exclusion <- all_data %>% group_by(workerid,system.Browser) %>%
summarise(accuracy = mean(response_correct))
`summarise()` has grouped output by 'workerid'. You can override using the `.groups` argument.
exclusion
Now we can add to this another column, called exclude, which will assign them a value of ‘Yes’ if their data needs to be excluded based on their accuracy. If they scored at or above 85% accuracy, then they do not need to be excluded, so their exclude value is ‘No’.
exclusion <- exclusion %>%
mutate(exclude = ifelse(accuracy < 0.85,'Yes','No'))
exclusion
Now we want to just get a list of all the participants who have been assigned a ‘Yes’ value, in order to eventually remove these participants from all the data. This code does that.
exclusion = exclusion %>%
filter(exclude == 'Yes') %>%
select(workerid,system.Browser)
exclusion
Now, finally, we can exclude these participants! We do this by redefining all_data as itself, minus the rows whose workerid values appear in the list we just made in the above step!
all_data <- all_data[!(all_data$workerid %in% exclusion$workerid),]
all_data %>%
select(workerid,rt,trial_id)
If we want to check that the right number was subtracted, we can get the length of the list we made (exclusion) and see if that matches the difference between the number of unique participants in our new all_data data frame and the original one (which had 298 to begin with). To do this we get the length of the list of unique worker ids in the all_data data frame, and subtract it from 298. Then we compare this to the length of the exclusion list.
298 - length(unique(all_data$workerid))
[1] 19
length(unique(exclusion$workerid))
[1] 19
If we’re being really extra, we can also run this as an equality check to get a boolean TRUE/FALSE value:
298 - length(unique(all_data$workerid)) == length(unique(exclusion$workerid))
[1] TRUE
We can also filter out any trials whose log reading time falls more than 2 standard deviations from the mean log reading time for that trial.
all_data <- all_data %>%
group_by(trial_id) %>%
mutate(id_mean = mean(log(rt))) %>%
mutate(exclusion = log(rt) < id_mean - 2*sd(log(rt)) | log(rt) > id_mean + 2*sd(log(rt))) %>%
ungroup() %>%
filter(exclusion == FALSE)
Now that we have only the rows we want, let’s add some new columns, which will contain important information for each data point. Here, we will be adding: the gender-ideology subscale scores, the gender of the trial’s referent, the morphological type of the neutral form, the length of the critical item, the trial’s congruency condition, length-residualized reading times, and political party.
Ideally, I would’ve added most of these when I actually created the stimuli and logged responses, but I forgot to! Luckily, R allows us to do this post-hoc fairly straightforwardly… which is good, since these features will be critical in our data visualization and analysis.
Again, some of this code is fairly ugly and involved, or irrelevant, so I’ve once again divvied it up into individual tabs, which you’re free to peruse or not.
The question under investigation here is whether or not individuals’ conceptions of gender affect how they process gendered and gender-neutral forms of English personal and professional titles.
In order to examine this, we need to quantify participants’ ideological views! Here we have adopted the 13-item Social Roles Questionnaire put forth in Baber & Tucker (2006). Questions 1-5 correspond to the ‘Gender Transcendent’ subscale, and questions 6-13 correspond to the ‘Gender Linked’ subscale. Each item is scored on a scale of 0-100. So, the first thing we want to do is make two lists of columns which correspond to these two subscales, since the questions are stored individually in the data:
gender_transcendence_cols <- c('subject_information.gender_q1','subject_information.gender_q2','subject_information.gender_q3','subject_information.gender_q4','subject_information.gender_q5')
gender_linked_cols <- c('subject_information.gender_q6','subject_information.gender_q7','subject_information.gender_q8','subject_information.gender_q9','subject_information.gender_q10','subject_information.gender_q11','subject_information.gender_q12','subject_information.gender_q13')
Now we can use the mutate() function on all_data to add two new columns, one for each subscale. We tell R to take the means of the specified columns in [column_names] of all_data for each individual row: rowMeans(all_data[column_names]).
We also have to subtract this mean from 100 in the case of the ‘Gender Transcendent’ subscale, since it is inversely scored. This is easy enough to do during the mutation step:
all_data <- all_data %>%
mutate(gender_trans = 100 - (rowMeans(all_data[gender_transcendence_cols]))) %>%
mutate(gender_link = rowMeans(all_data[gender_linked_cols]))
all_data %>%
select(workerid,rt,gender_trans,gender_link)
Finally, we probably want something that includes the average across all the gender questions, regardless of subscale. This is easy enough, since we just have to average the two subscores we already made. So, let’s define a column list:
gender_all = c('gender_trans','gender_link')
Now we mutate a new column!
all_data <- all_data %>%
mutate(gender_total = rowMeans(all_data[gender_all]))
all_data %>%
select(workerid,rt,gender_trans,gender_link,gender_total)
We also want to add whether the trial included a female or male referent (but also, like, destroy the binary!). In order to do this, we’ll just add a trial_gender column that says ‘female’ if the condition was either ‘neutral_female’ or ‘congruent_female’. Otherwise, we want the trial_gender to say ‘male’.
all_data <- all_data %>%
mutate(trial_gender = ifelse(condition=='neutral_female' | condition == 'congruent_female','female','male'))
all_data %>%
select(workerid,rt,condition,trial_id,trial_gender)
Now we want to add whether or not the lexeme’s neutral form is derived by compounding (as in ‘congress-person’) or by the adoption of the male form (as in ‘actor’ being increasingly used for both men and women). In this study, we only have six lexemes of the latter type, so we’ll just tell R to assign those a morph_type value of ‘adoption’ (for ‘male adoption’), and all else will be assigned a value of ‘compound’.
all_data <- all_data %>%
mutate(morph_type = ifelse(lexeme %in% c('actor','host','hunter','villain','heir','hero'),'adoption','compound'))
all_data %>%
select(rt,lexeme,morph_type)
Another important factor we want to explore is the length of the critical item! In order to add this, we simply create a new column form_length and tell R to input as that column’s value the length of the string that appears in that row’s form column, which corresponds to the orthographic form of the critical item in that trial. Note that this will include spaces in the count!
all_data <- all_data %>%
mutate(form_length = str_length(form))
all_data %>%
select(rt,lexeme,form,form_length)
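To see how spaces figure into the count, here is a quick illustration with two made-up forms (whether these particular forms appear in the stimuli doesn't matter):

```r
library(stringr)

# str_length() counts every character, spaces included
str_length("policewoman")     # 11
str_length("police officer")  # 14, the space adds one character
```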
Finally, let’s make sure we have a column which records whether or not the trial was gender-congruent (as in ‘Shelby is a congresswoman’) or gender neutral (as in ‘Shelby is a congressperson’). We add a trial_congruency column, which is valued as ‘congruent’ if that row’s condition is one of the two congruent conditions. Otherwise, it gets valued as ‘neutral’.
all_data <- all_data %>%
mutate(trial_congruency = ifelse(condition=='congruent_male' | condition == 'congruent_female','congruent','neutral'))
all_data %>%
select(rt,condition,trial_congruency)
Longer forms naturally take longer to read, so to control for length we first fit a simple linear model predicting log reading time from form length:
simple_model <- lm(log(rt)~form_length, data = all_data)
summary(simple_model)
Call:
lm(formula = log(rt) ~ form_length, data = all_data)
Residuals:
Min 1Q Median 3Q Max
-1.4812 -0.2896 -0.0084 0.3201 1.8782
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.997374 0.020123 298.04 <2e-16 ***
form_length 0.026594 0.002058 12.92 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4919 on 5340 degrees of freedom
Multiple R-squared: 0.03031, Adjusted R-squared: 0.03013
F-statistic: 166.9 on 1 and 5340 DF, p-value: < 2.2e-16
We then store the residuals from this model in a new resid_rt column, and plot them against the raw log reading times as a sanity check:
all_data <- all_data %>%
mutate(resid_rt = resid(simple_model))
ggplot(data=all_data, aes(x=log(rt),y=resid_rt)) +
geom_point()
We also collapse the 5-point party-alignment scale into three bins: values 1-2 are coded ‘Republican’, 4-5 ‘Democrat’, and the midpoint (3) ‘Non-Partisan’:
all_data <- all_data %>%
mutate(poli_party = ifelse(subject_information.party_alignment == 1 | subject_information.party_alignment == 2,'Republican',ifelse(subject_information.party_alignment == 4 | subject_information.party_alignment == 5,'Democrat','Non-Partisan')))
Now we can use the head() method to check the current state of the data frame, which should include a grand total of 46 columns!
head(all_data)
And now, our data should be completely ready to visualize and analyze!
[THIS PART UNDER CONSTRUCTION: COMING SOON]
Now that we have our data ready for visualization and analysis, let’s do the former!
First, let’s define some color palettes, including a colorblind-friendly one (cbPalette):
inauguration_2021 = c("#5445b1", "#749dae", "#f3c483", "#5c1a33", "#cd3341","#f7dc6a")
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
ggplot(all_data, aes(x=rt, fill=morph_type)) +
geom_density(alpha=.6) +
labs(x="Raw Reading Time", y="Density",fill="Critical Item Morphology Type") +
scale_fill_manual(values = cbPalette)
ggsave("plots/morph_type_density.jpg",height=5,width=8)
ggplot(all_data, aes(x=rt, fill=condition)) +
geom_density(alpha=.4) +
labs(x="Raw Reading Time", y="Density",fill="Critical Item Condition") +
scale_fill_manual(values = cbPalette)
ggplot(all_data, aes(x=rt, fill=trial_gender)) +
geom_density(alpha=.6) +
labs(x="Raw Reading Time", y="Density",fill="Critical Item Gender") +
scale_fill_manual(values = cbPalette)
ggplot(all_data, aes(x=rt, fill=trial_congruency)) +
geom_density(alpha=.6) +
labs(x="Raw Reading Time", y="Density",fill="Critical Item Congruency") +
scale_fill_manual(values = cbPalette)
ggplot(all_data, aes(x=gender_total, y=rt)) +
geom_point(alpha=.5) +
geom_smooth(method = 'lm', size=1.2)
all_data %>%
group_by(subject_information.party_alignment,condition) %>%
summarize(MeanRT = mean(rt), CI.Low = ci.low(rt), CI.High = ci.high(rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=subject_information.party_alignment,y=MeanRT,fill=subject_information.party_alignment)) +
geom_bar(stat='identity') +
theme(legend.position = "none") +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25) +
facet_wrap(~ condition)
`summarise()` has grouped output by 'subject_information.party_alignment'. You can override using the `.groups` argument.
ggplot(all_data, aes(x=form_length,y=rt)) +
geom_point()
agr <- all_data %>%
group_by(trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(rt), CI.Low = ci.low(rt), CI.High = ci.high(rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High)
`summarise()` has grouped output by 'trial_gender'. You can override using the `.groups` argument.
dodge = position_dodge(.9)
ggplot(data=agr, aes(x=trial_gender,y=MeanRT,fill=trial_congruency)) +
geom_bar(stat='identity',position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge)
temp <- all_data %>%
group_by(trial_gender) %>%
summarize(MeanRT = mean(rt), CI.Low = ci.low(rt), CI.High = ci.high(rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High)
dodge = position_dodge(.9)
ggplot(data=temp, aes(x=trial_gender,y=MeanRT,fill=trial_gender)) +
geom_bar(stat='identity',position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge) +
theme(legend.position = 'none')
all_data %>%
ggplot(aes(x=morph_type, y=rt)) +
geom_boxplot()
agg_speaker_mean_length <- all_data %>%
group_by(form_length,workerid) %>%
summarize(MeanRT=mean(rt))
`summarise()` has grouped output by 'form_length'. You can override using the `.groups` argument.
all_data %>%
group_by(form_length) %>%
summarize(MeanRT = mean(rt), CI.Low = ci.low(rt), CI.High = ci.high(rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=form_length,y=MeanRT)) +
geom_col() +
geom_jitter(data = agg_speaker_mean_length, aes(y=MeanRT),alpha=.2,color='blue') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
geom_smooth(method = 'lm', size=1.2)
all_data %>%
ggplot(aes(x=form_length, y=rt,color=morph_type)) +
geom_jitter() +
geom_smooth(method = 'lm', size=1.2)
all_data %>%
ggplot(aes(x=log(gender_trans), y=resid_rt,color=morph_type)) +
geom_jitter() +
geom_smooth(method = 'lm', size=1.2)
all_data %>%
ggplot(aes(x=subject_information.age, y=resid_rt,color=morph_type)) +
geom_jitter() +
geom_smooth(method = 'lm', size=1.2)
all_data %>%
filter(!is.na(poli_party)) %>%
group_by(workerid,subject_information.education) %>%
ggplot(aes(x=subject_information.education)) +
geom_bar() +
facet_wrap(~poli_party)
agg_speaker_mean <- all_data %>%
group_by(morph_type,workerid) %>%
summarize(MeanRT=mean(resid_rt))
`summarise()` has grouped output by 'morph_type'. You can override using the `.groups` argument.
all_data %>%
group_by(morph_type) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=morph_type,y=MeanRT)) +
geom_point(size=3) +
geom_jitter(data = agg_speaker_mean, aes(y=MeanRT),alpha=.2,color='blue') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25)
agg_speaker_mean_con <- all_data %>%
group_by(condition,workerid) %>%
summarize(MeanRT=mean(resid_rt))
`summarise()` has grouped output by 'condition'. You can override using the `.groups` argument.
all_data %>%
group_by(condition,trial_gender) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender)) +
geom_point(size=3) +
geom_jitter(data = agg_speaker_mean_con, aes(y=MeanRT),alpha=.2,color='mediumslateblue') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
scale_color_manual(values = cbPalette)
`summarise()` has grouped output by 'condition'. You can override using the `.groups` argument.
all_data %>%
group_by(condition,trial_gender,trial_congruency,lexeme) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ lexeme) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = cbPalette)
`summarise()` has grouped output by 'condition', 'trial_gender', 'trial_congruency'. You can override using the `.groups` argument.
temp <- all_data %>%
group_by(lexeme,trial_gender,trial_congruency) %>%
summarize(meanRT = mean(resid_rt)) %>%
spread(trial_congruency,meanRT) %>%
mutate(con_dif = neutral-congruent)
`summarise()` has grouped output by 'lexeme', 'trial_gender'. You can override using the `.groups` argument.
ggplot(temp, aes(x=trial_gender,y=con_dif, fill=trial_gender)) +
geom_bar(stat='identity') +
theme(legend.position = 'none') +
geom_hline(yintercept =0) +
facet_wrap(~lexeme) +
labs(x = 'Trial Gender', y='Congruency Difference (Neutral-Congruent)')
all_data %>%
filter(lexeme=='firefighter') %>%
group_by(trial_gender) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=trial_gender,y=MeanRT,color=trial_gender)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25)
ggplot(all_data, aes(x=trial_gender,y=resid_rt)) +
geom_bar(stat='identity') +
facet_wrap(~lexeme)
ggsave("plots/difference-plot.jpg",height=5,width=8)
all_data %>%
filter(!is.na(poli_party)) %>%
group_by(poli_party,condition,trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ poli_party, nrow = 1) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = cbPalette)
`summarise()` has grouped output by 'poli_party', 'condition', 'trial_gender'. You can override using the `.groups` argument.
all_data %>%
filter(!is.na(subject_information.party_alignment)) %>%
group_by(subject_information.party_alignment,condition,trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ subject_information.party_alignment, nrow = 1) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = cbPalette)
`summarise()` has grouped output by 'subject_information.party_alignment', 'condition', 'trial_gender'. You can override using the `.groups` argument.
aggr_speaker <- all_data %>%
group_by(gender_link,workerid,trial_gender,trial_congruency) %>%
summarise(meanrt = mean(resid_rt))
`summarise()` has grouped output by 'gender_link', 'workerid', 'trial_gender'. You can override using the `.groups` argument.
aggr_speaker %>%
ggplot(aes(x=gender_link,y=meanrt,color=trial_gender,linetype=trial_congruency)) +
geom_point() +
geom_smooth(method='lm')
agg_speaker_mean_gen <- all_data %>%
group_by(trial_gender,workerid) %>%
summarize(MeanRT=mean(resid_rt))
`summarise()` has grouped output by 'trial_gender'. You can override using the `.groups` argument.
all_data %>%
group_by(trial_gender) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=trial_gender,y=MeanRT)) +
geom_point(size=3) +
geom_jitter(data = agg_speaker_mean_gen, aes(y=MeanRT),alpha=.4,color='mediumslateblue') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25)
all_data %>%
filter(!is.na(poli_party)) %>%
group_by(trial_congruency,poli_party) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=trial_congruency,y=MeanRT)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~poli_party)
`summarise()` has grouped output by 'trial_congruency'. You can override using the `.groups` argument.
all_data %>%
filter(!is.na(poli_party)) %>%
group_by(trial_gender,poli_party) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=trial_gender,y=MeanRT)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~poli_party)
`summarise()` has grouped output by 'trial_gender'. You can override using the `.groups` argument.
poli_data <- all_data %>%
group_by(workerid) %>%
summarise(party = paste(unique(poli_party)))
table(poli_data$party)
Democrat NA Non-Partisan Republican
113 2 60 103
poli_data_gran <- all_data %>%
group_by(workerid) %>%
summarise(party = paste(unique(subject_information.party_alignment)))
table(poli_data_gran$party)
1 2 3 4 5 NA
27 76 60 60 53 2
Before modeling, we center the two binary predictors (subtracting each one’s observed mean from its numeric coding) and z-score the gender_link subscale:
all_data <- all_data %>%
mutate(ctrial_congruency = as.numeric(as.factor(trial_congruency))-mean(as.numeric(as.factor(trial_congruency)))) %>%
mutate(ctrial_gender = as.numeric(as.factor(trial_gender))-mean(as.numeric(as.factor(trial_gender)))) %>%
mutate(cgender_link = scale(gender_link))
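As a quick illustration of what this centering does (toy vector, not the real data):

```r
# A balanced two-level factor is coded 1/2 by as.numeric(as.factor(.)),
# so subtracting its mean (1.5) yields -0.5/+0.5 contrast codes
x <- factor(c('congruent', 'neutral', 'congruent', 'neutral'))
centered <- as.numeric(x) - mean(as.numeric(x))
centered  # -0.5  0.5 -0.5  0.5
```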
Now we can fit our full model: a linear mixed-effects model predicting residualized reading times from the centered predictors and their interactions, with random intercepts for participant, lexeme, and name:
complex_model <- lmer(resid_rt~ctrial_congruency*ctrial_gender*cgender_link + (1|workerid) + (1|lexeme) + (1|name),data = all_data,control=lmerControl(optCtrl=list(maxfun=40000)))
summary(complex_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: resid_rt ~ ctrial_congruency * ctrial_gender * cgender_link +
(1 | workerid) + (1 | lexeme) + (1 | name)
Data: all_data
Control: lmerControl(optCtrl = list(maxfun = 40000))
REML criterion at convergence: 3821
Scaled residuals:
Min 1Q Median 3Q Max
-3.9473 -0.6006 -0.0489 0.5240 4.9953
Random effects:
Groups Name Variance Std.Dev.
workerid (Intercept) 0.149388 0.38651
name (Intercept) 0.000236 0.01536
lexeme (Intercept) 0.001422 0.03771
Residual 0.098718 0.31419
Number of obs: 5342, groups: workerid, 278; name, 24; lexeme, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.989e-03 2.524e-02 2.558e+02 0.079 0.93724
ctrial_congruency -2.719e-03 8.622e-03 5.041e+03 -0.315 0.75247
ctrial_gender -3.443e-02 1.067e-02 2.215e+01 -3.227 0.00385 **
cgender_link -5.114e-02 2.359e-02 2.740e+02 -2.168 0.03101 *
ctrial_congruency:ctrial_gender 4.703e-02 1.724e-02 5.040e+03 2.728 0.00640 **
ctrial_congruency:cgender_link -1.512e-03 8.634e-03 5.048e+03 -0.175 0.86101
ctrial_gender:cgender_link -5.162e-03 8.626e-03 5.031e+03 -0.598 0.54955
ctrial_congruency:ctrial_gender:cgender_link 7.885e-03 1.726e-02 5.045e+03 0.457 0.64776
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) ctrl_c ctrl_g cgndr_ ctrl_cngrncy:ct_ ctrl_cngrncy:cg_ ctrl_g:_
ctrl_cngrnc 0.000
ctrial_gndr 0.000 -0.002
cgender_lnk 0.000 0.000 -0.001
ctrl_cngrncy:ct_ 0.000 0.000 0.000 0.001
ctrl_cngrncy:cg_ 0.000 -0.001 0.003 0.000 -0.005
ctrl_gndr:_ -0.001 0.003 0.001 -0.001 0.003 -0.001
ctrl_cn:_:_ 0.000 -0.005 0.003 0.000 -0.001 -0.001 -0.001
plot(complex_model)
complex_model_bare <- lmer(resid_rt~trial_congruency*trial_gender + (1 + trial_congruency + trial_gender | workerid) + (1|lexeme) + (1|name),data = all_data,control=lmerControl(optCtrl=list(maxfun=20000)))
Model failed to converge with max|grad| = 0.00547368 (tol = 0.002, component 1)
summary(complex_model_bare)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: resid_rt ~ trial_congruency * trial_gender + (1 + trial_congruency +
trial_gender | workerid) + (1 | lexeme) + (1 | name)
Data: all_data
Control: lmerControl(optCtrl = list(maxfun = 20000))
REML criterion at convergence: 3794.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9481 -0.5975 -0.0524 0.5258 5.0348
Random effects:
Groups Name Variance Std.Dev. Corr
workerid (Intercept) 0.1557168 0.39461
trial_congruencyneutral 0.0012882 0.03589 0.00
trial_gendermale 0.0024540 0.04954 -0.22 -0.88
name (Intercept) 0.0002237 0.01496
lexeme (Intercept) 0.0014150 0.03762
Residual 0.0977015 0.31257
Number of obs: 5342, groups: workerid, 278; name, 24; lexeme, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.03257 0.02691 270.59332 1.210 0.22715
trial_congruencyneutral -0.02639 0.01236 930.63983 -2.135 0.03304 *
trial_gendermale -0.05809 0.01392 64.26563 -4.172 9.23e-05 ***
trial_congruencyneutral:trial_gendermale 0.04719 0.01715 4509.61144 2.751 0.00597 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) trl_cn trl_gn
trl_cngrncy -0.222
tril_gndrml -0.290 0.397
trl_cngrn:_ 0.161 -0.699 -0.618
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00547368 (tol = 0.002, component 1)
plot(complex_model_bare)
complex_model_antirandom <- lm(resid_rt~ctrial_congruency*ctrial_gender*cgender_link, data = all_data)
plot(complex_model_antirandom)
As a sanity check on the ideology measure, we drop any rows with missing values and ask whether gender_total is predicted by participant gender and political party:
no_genderless <- all_data[complete.cases(all_data), ]
gender_check <- lm(gender_total~subject_information.gender*poli_party, data=no_genderless)
summary(gender_check)
Call:
lm(formula = gender_total ~ subject_information.gender * poli_party,
data = no_genderless)
Residuals:
Min 1Q Median 3Q Max
-31.805 -9.193 -0.716 6.994 48.282
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.7206 2.3899 6.996 2.96e-12 ***
subject_information.genderFemale -4.2548 2.4184 -1.759 0.07858 .
subject_information.genderMale 4.8015 2.3491 2.044 0.04101 *
subject_information.genderOther -7.5833 2.9816 -2.543 0.01101 *
poli_partyNon-Partisan 12.0707 0.7416 16.276 < 2e-16 ***
poli_partyRepublican 16.4216 0.5906 27.803 < 2e-16 ***
subject_information.genderFemale:poli_partyNon-Partisan -0.7917 0.9746 -0.812 0.41665
subject_information.genderMale:poli_partyNon-Partisan NA NA NA NA
subject_information.genderOther:poli_partyNon-Partisan NA NA NA NA
subject_information.genderFemale:poli_partyRepublican -2.4439 0.8370 -2.920 0.00352 **
subject_information.genderMale:poli_partyRepublican NA NA NA NA
subject_information.genderOther:poli_partyRepublican NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.1 on 5255 degrees of freedom
Multiple R-squared: 0.331, Adjusted R-squared: 0.3301
F-statistic: 371.5 on 7 and 5255 DF, p-value: < 2.2e-16
plot(gender_check)